1.
bioRxiv ; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38405958

ABSTRACT

Background: The Human Proteome Project has credibly detected nearly 93% of the roughly 20,000 proteins which are predicted by the human genome. However, the proteome is enigmatic, where alterations in amino acid sequences from polymorphisms and alternative splicing, errors in translation, and post-translational modifications result in a proteome depth estimated at several million unique proteoforms. Recently mass spectrometry has been demonstrated in several landmark efforts mapping the human proteoform landscape in bulk analyses. Herein, we developed an integrated workflow for characterizing proteoforms from human tissue in a spatially resolved manner by coupling laser capture microdissection, nanoliter-scale sample preparation, and mass spectrometry imaging. Results: Using healthy human kidney sections as the case study, we focused our analyses on the major functional tissue units including glomeruli, tubules, and medullary rays. After laser capture microdissection, these isolated functional tissue units were processed with microPOTS (microdroplet processing in one-pot for trace samples) for sensitive top-down proteomics measurement. This provided a quantitative database of 616 proteoforms that was further leveraged as a library for mass spectrometry imaging with near-cellular spatial resolution over the entire section. Notably, several mitochondrial proteoforms were found to be differentially abundant between glomeruli and convoluted tubules, and further spatial contextualization was provided by mass spectrometry imaging confirming unique differences identified by microPOTS, and further expanding the field-of-view for unique distributions such as enhanced abundance of a truncated form (1-74) of ubiquitin within cortical regions. Conclusions: We developed an integrated workflow to directly identify proteoforms and reveal their spatial distributions. 
Of the 20 differentially abundant proteoforms that microPOTS identified as discriminating between tubules and glomeruli, the vast majority of tubular proteoforms were of mitochondrial origin (8 of 10), whereas the discriminating proteoforms in glomeruli were primarily hemoglobin subunits (9 of 10). These trends were also identified within ion images, demonstrating spatially resolved characterization of proteoforms that has the potential to reshape discovery-based proteomics, because proteoforms are the ultimate effectors of cellular functions. Applications of this technology have the potential to unravel the etiology and pathophysiology of disease states, informing on biologically active proteoforms, which remodel the proteomic landscape in chronic and acute disorders.

2.
J Proteome Res ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38421884

ABSTRACT

Proteoforms, the different forms of a protein with sequence variations including post-translational modifications (PTMs), execute vital functions in biological systems, such as cell signaling and epigenetic regulation. Advances in top-down mass spectrometry (MS) technology have permitted the direct characterization of intact proteoforms and their exact number of modification sites, allowing for the relative quantification of positional isomers (PI). Protein positional isomers refer to a set of proteoforms with identical total mass and set of modifications, but varying PTM site combinations. The relative abundance of PI can be estimated by matching proteoform-specific fragment ions to top-down tandem MS (MS2) data to localize and quantify modifications. However, the current approaches heavily rely on manual annotation. Here, we present IsoForma, an open-source R package for the relative quantification of PI within a single tool. Benchmarking IsoForma's performance against two existing workflows produced comparable results and improvements in speed. Overall, IsoForma provides a streamlined process for quantifying PI, reduces the analysis time, and offers an essential framework for developing customized proteoform analysis workflows. The software is open source and available at https://github.com/EMSL-Computing/isoforma-lib.

3.
Pac Symp Biocomput ; 29: 170-186, 2024.
Article in English | MEDLINE | ID: mdl-38160278

ABSTRACT

Wearable silicone wristbands are a rapidly growing exposure assessment technology that offer researchers the ability to study previously inaccessible cohorts and have the potential to provide a more comprehensive picture of chemical exposure within diverse communities. However, there are no established best practices for analyzing the data within a study or across multiple studies, thereby limiting impact and access of these data for larger meta-analyses. We utilize data from three studies, from over 600 wristbands worn by participants in New York City and Eugene, Oregon, to present a first-of-its-kind manuscript detailing wristband data properties. We further discuss and provide concrete examples of key areas and considerations in common statistical modeling methods where best practices must be established to enable meta-analyses and integration of data from multiple studies. Finally, we detail important and challenging aspects of machine learning, meta-analysis, and data integration that researchers will face in order to extend beyond the limited scope of individual studies focused on specific populations.


Subject(s)
Environmental Monitoring; Wearable Electronic Devices; Humans; Computational Biology; Data Analysis; Environmental Monitoring/methods; Silicones/chemistry
4.
J Proteome Res ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38085827

ABSTRACT

PMart is a web-based tool for reproducible quality control, exploratory data analysis, statistical analysis, and interactive visualization of 'omics data, based on the functionality of the pmartR R package. The newly improved user interface supports more 'omics data types, additional statistical capabilities, and enhanced options for creating downloadable graphics. PMart supports the analysis of label-free and isobaric-labeled (e.g., TMT, iTRAQ) proteomics, nuclear magnetic resonance (NMR) and mass-spectrometry (MS)-based metabolomics, MS-based lipidomics, and ribonucleic acid sequencing (RNA-seq) transcriptomics data. At the end of a PMart session, a report is available that summarizes the processing steps performed and includes the pmartR R package functions used to execute the data processing. In addition, built-in safeguards in the backend code prevent users from utilizing methods that are inappropriate based on omics data type. PMart is a user-friendly interface for conducting exploratory data analysis and statistical comparisons of omics data without programming.

5.
bioRxiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37873084

ABSTRACT

Wearable silicone wristbands are a rapidly growing exposure assessment technology that offer researchers the ability to study previously inaccessible cohorts and have the potential to provide a more comprehensive picture of chemical exposure within diverse communities. However, there are no established best practices for analyzing the data within a study or across multiple studies, thereby limiting impact and access of these data for larger meta-analyses. We utilize data from three studies, from over 600 wristbands worn by participants in New York City and Eugene, Oregon, to present a first-of-its-kind manuscript detailing wristband data properties. We further discuss and provide concrete examples of key areas and considerations in common statistical modeling methods where best practices must be established to enable meta-analyses and integration of data from multiple studies. Finally, we detail important and challenging aspects of machine learning, meta-analysis, and data integration that researchers will face in order to extend beyond the limited scope of individual studies focused on specific populations.

6.
Metabolites ; 13(10)2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37887426

ABSTRACT

Metabolomics provides a unique snapshot into the world of small molecules and the complex biological processes that govern the human, animal, plant, and environmental ecosystems encapsulated by the One Health modeling framework. However, this "molecular snapshot" is only as informative as the number of metabolites confidently identified within it. The spectral similarity (SS) score is traditionally used to identify compound(s) in mass spectrometry approaches to metabolomics, where spectra are matched to reference libraries of candidate spectra. Unfortunately, there is little consensus on which of the dozens of available SS metrics should be used. This lack of standard SS score creates analytic uncertainty and potentially leads to issues in reproducibility, especially as these data are integrated across other domains. In this work, we use metabolomic spectral similarity as a case study to showcase the challenges in consistency within just one piece of the One Health framework that must be addressed to enable data science approaches for One Health problems. Here, using a large cohort of datasets comprising both standard and complex datasets with expert-verified truth annotations, we evaluated the effectiveness of 66 similarity metrics to delineate between correct matches (true positives) and incorrect matches (true negatives). We additionally characterize the families of these metrics to make informed recommendations for their use. Our results indicate that specific families of metrics (the Inner Product, Correlative, and Intersection families of scores) tend to perform better than others, with no single similarity metric performing optimally for all queried spectra. This work and its findings provide an empirically-based resource for researchers to use in their selection of similarity metrics for GC-MS identification, increasing scientific reproducibility through taking steps towards standardizing identification workflows.
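
As a rough illustration of what a score from the Inner Product family looks like, the sketch below computes a plain cosine similarity between a query spectrum and a reference spectrum. It is not taken from the study: the dict-of-bins spectrum representation, the function name, and the example intensities are all assumptions made for the example, and the 66 metrics evaluated in the paper may use different weighting or normalization.

```python
import math


def cosine_similarity(query, reference):
    """Cosine (normalized inner product) score between two spectra.

    Each spectrum is a dict mapping an integer m/z bin to an intensity.
    This representation is an assumption made for illustration only.
    """
    shared_bins = set(query) & set(reference)
    dot = sum(query[mz] * reference[mz] for mz in shared_bins)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_q * norm_r)


# Hypothetical spectra (made-up intensities, not real library entries).
query = {73: 100.0, 147: 40.0, 205: 10.0}
identical = dict(query)
disjoint = {50: 80.0, 60: 20.0}
```

A score near 1 indicates that the two spectra share peaks with similar relative intensities, and a score of 0 indicates no shared peaks. Ranking library candidates by such a score is one common way to produce a hit list.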

7.
J Am Soc Mass Spectrom ; 34(9): 2061-2064, 2023 Sep 06.
Article in English | MEDLINE | ID: mdl-37523489

ABSTRACT

Due to its speed, accuracy, and adaptability to various sample types, matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has become a popular method to identify molecular isotope profiles from biological samples. Often MALDI-MS data do not include tandem MS fragmentation data, and thus the identification of compounds in samples requires external databases so that the accurate mass of detected signals can be matched to known molecular compounds. Most relevant MALDI-MS software tools developed to confirm compound identifications are focused on small molecules (e.g., metabolites, lipids) and cannot be easily adapted to protein data due to their more complex isotopic distributions. Here, we present an R package called IsoMatchMS for the automated annotation of MALDI-MS data for multiple datatypes: intact proteins, peptides, and glycans. This tool accepts already derived molecular formulas or, for proteomics applications, can derive molecular formulas from a list of input peptides or proteins including proteins with post-translational modifications. Visualization of all matched isotopic profiles is provided in a highly accessible HTML format called a trelliscope display, which allows users to filter and sort by several parameters such as match scores and the number of peaks matched. IsoMatchMS simplifies the annotation and visualization of MALDI-MS data for downstream analyses.


Subject(s)
Proteins; Software; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods; Proteins/chemistry; Peptides; Proteomics/methods
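
The accurate-mass matching step mentioned in the abstract above can be sketched in a simplified form: each observed m/z value is compared with theoretical values from a database and accepted when the error is within a parts-per-million tolerance. This is only a single-peak sketch under stated assumptions, not the IsoMatchMS algorithm, which matches whole isotopic profiles. The function names, the 10 ppm tolerance, and the candidate masses are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_peaks(observed, candidates, tol_ppm=10.0):
    """Return (observed m/z, candidate name, ppm error) for each matched peak.

    `candidates` maps a name to a theoretical m/z. For each observed peak the
    candidate with the smallest absolute ppm error inside the tolerance wins.
    """
    matches = []
    for mz in observed:
        best = None
        for name, theoretical in candidates.items():
            err = ppm_error(mz, theoretical)
            if abs(err) <= tol_ppm and (best is None or abs(err) < abs(best[1])):
                best = (name, err)
        if best is not None:
            matches.append((mz, best[0], best[1]))
    return matches


# Hypothetical database entries and observed peaks (made-up values).
candidates = {"compound_A": 500.0000, "compound_B": 500.0100}
observed = [500.0020, 600.0000]
matches = match_peaks(observed, candidates)
```

Here the first peak lies about 4 ppm from `compound_A` and about 16 ppm from `compound_B`, so only the former is accepted. The second peak has no candidate within tolerance and is left unannotated.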
8.
Anal Chem ; 95(19): 7536-7544, 2023 May 16.
Article in English | MEDLINE | ID: mdl-37129113

ABSTRACT

As metabolomics grows into a high-throughput and high demand research field, current metrics for the identification of small molecules in gas chromatography-mass spectrometry (GC-MS) still require manual verification. Though steps have been taken to improve scoring metrics by combining spectral similarity (SS) and retention index (RI), the problem persists. A large body of literature has analyzed and refined SS scores, but few studies have explicitly studied improvements to RI scores. Here, we examined whether uninvestigated assumptions of the RI score are valid and propose ways to improve them. Query RIs were matched to library RIs with a generous window of ±35 to avoid unintentional removal of valid compound identifications. Each match was manually verified as a true positive (TP), true negative, or unknown. Metabolites with at least 30 TP identifications were included in downstream analyses, resulting in a total of 87 metabolites from samples of varying complexity and type (e.g., amino acid mixtures, human urine, fungal species, and so on). Our results showed that the RI score assumptions of normality, consistent variance across metabolites, and a mean error centered at 0 are often violated. We demonstrated through a cross-validation analysis that modifying these underlying assumptions according to empirical metabolite-specific distributions improved the TP and true negative rankings. Further, we statistically determined the minimum number of samples required to estimate distributional parameters for scoring metrics. Overall, this work proposes a robust statistical pipeline to reduce the time bottleneck of metabolite identification by improving RI scores, thus minimizing the effort to complete manual verification.


Subject(s)
Metabolomics; Humans; Gas Chromatography-Mass Spectrometry/methods; Metabolomics/methods
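
To make the RI score assumptions discussed in the abstract above concrete, the sketch below scores a query by the Gaussian log-density of its RI error. It compares a generic setting (mean error 0 and one shared standard deviation) with a metabolite-specific setting whose mean and standard deviation are estimated from verified true-positive errors. This is a minimal sketch, not the authors' pipeline: the function names, the shared standard deviation of 3.0, and the example errors are all assumptions.

```python
import math
import statistics


def ri_score(query_ri, library_ri, mean_error=0.0, sd=3.0):
    """Gaussian log-density of the RI error (query minus library).

    The defaults encode the generic assumptions: errors centered at 0 with a
    single shared standard deviation (3.0 is an arbitrary illustrative value).
    """
    error = query_ri - library_ri
    z = (error - mean_error) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))


def fit_error_distribution(tp_errors):
    """Empirical mean and sample standard deviation of verified TP errors."""
    return statistics.mean(tp_errors), statistics.stdev(tp_errors)


# Hypothetical RI errors from verified true positives of one metabolite.
tp_errors = [4.0, 5.0, 6.0, 5.5, 4.5]
mu, sigma = fit_error_distribution(tp_errors)

# A new query for the same metabolite that is 5 RI units above the library value.
generic = ri_score(1205.0, 1200.0)
specific = ri_score(1205.0, 1200.0, mean_error=mu, sd=sigma)
```

In this made-up case the metabolite consistently elutes about 5 units late, so the metabolite-specific score is higher than the generic score, which penalizes the offset as if it were an error.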
9.
J Am Soc Mass Spectrom ; 34(6): 1096-1104, 2023 Jun 07.
Article in English | MEDLINE | ID: mdl-37084380

ABSTRACT

The ability to reliably identify small molecules (e.g., metabolites) is key toward driving scientific advancement in metabolomics. Gas chromatography-mass spectrometry (GC-MS) is an analytic method that may be applied to facilitate this process. The typical GC-MS identification workflow involves quantifying the similarity of an observed sample spectrum and other features (e.g., retention index) to that of several references, noting the compound of the best-matching reference spectrum as the identified metabolite. While a deluge of similarity metrics exist, none quantify the error rate of generated identifications, thereby presenting an unknown risk of false identification or discovery. To quantify this unknown risk, we propose a model-based framework for estimating the false discovery rate (FDR) among a set of identifications. Extending a traditional mixture modeling framework, our method incorporates both similarity score and experimental information in estimating the FDR. We apply these models to identification lists derived from across 548 samples of varying complexity and sample type (e.g., fungal species, standard mixtures, etc.), comparing their performance to that of the traditional Gaussian mixture model (GMM). Through simulation, we additionally assess the impact of reference library size on the accuracy of FDR estimates. In comparing the best performing model extensions to the GMM, our results indicate relative decreases in median absolute estimation error (MAE) ranging from 12% to 70%, based on comparisons of the median MAEs across all hit-lists. Results indicate that these relative performance improvements generally hold despite library size; however FDR estimation error typically worsens as the set of reference compounds diminishes.


Subject(s)
Metabolomics; Gas Chromatography-Mass Spectrometry/methods; Metabolomics/methods
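
The traditional Gaussian mixture baseline named in the abstract above can be sketched as follows. Similarity scores are modeled as a two-component mixture (incorrect and correct identifications), each hit receives a posterior probability of being incorrect, and the FDR of an accepted list is estimated as the average of those probabilities. This sketch assumes the mixture parameters have already been fitted and uses made-up values. It does not reproduce the authors' extended models, which also incorporate experimental information.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_error_prob(score, weight_incorrect, incorrect, correct):
    """Posterior probability that a hit with this score is incorrect.

    `incorrect` and `correct` are (mean, sd) tuples for the two components.
    """
    f0 = weight_incorrect * normal_pdf(score, *incorrect)
    f1 = (1.0 - weight_incorrect) * normal_pdf(score, *correct)
    return f0 / (f0 + f1)


def estimated_fdr(scores, threshold, weight_incorrect, incorrect, correct):
    """Mean posterior error probability among hits scoring at or above threshold."""
    accepted = [s for s in scores if s >= threshold]
    if not accepted:
        return 0.0
    probs = [
        posterior_error_prob(s, weight_incorrect, incorrect, correct)
        for s in accepted
    ]
    return sum(probs) / len(probs)


# Hypothetical fitted parameters and similarity scores (illustrative only).
incorrect_params = (0.4, 0.1)
correct_params = (0.8, 0.1)
scores = [0.5, 0.6, 0.7, 0.8, 0.9]
fdr_loose = estimated_fdr(scores, 0.5, 0.5, incorrect_params, correct_params)
fdr_strict = estimated_fdr(scores, 0.8, 0.5, incorrect_params, correct_params)
```

Raising the score threshold keeps only hits that the mixture attributes mostly to the correct component, so the estimated FDR of the stricter list is lower than that of the looser one.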
10.
Sci Data ; 10(1): 151, 2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36944655

ABSTRACT

The OSU/PNNL Superfund Research Program (SRP) represents a longstanding collaboration to quantify Polycyclic Aromatic Hydrocarbons (PAHs) at various superfund sites in the Pacific Northwest and assess their potential impact on human health. To link the chemical measurements to biological activity, we describe the use of the zebrafish as a high-throughput developmental toxicity model that provides quantitative measurements of the exposure to chemicals. Toward this end, we have linked over 150 PAHs found at Superfund sites to the effect of these same chemicals in zebrafish, creating a rich dataset that links environmental exposure to biological response. To quantify this response, we have implemented a dose-response modelling pipeline to calculate benchmark dose parameters which enable potency comparison across over 500 chemicals and 12 of the phenotypes measured in zebrafish. We provide a rich dataset for download and analysis as well as a web portal that provides public access to this dataset via an interactive web site designed to support exploration and re-use of these data by the scientific community at http://srp.pnnl.gov .


Subject(s)
Environmental Exposure; Polycyclic Aromatic Hydrocarbons; Zebrafish; Animals; Humans; Environmental Exposure/analysis; Hazardous Substances/analysis; Northwestern United States; Polycyclic Aromatic Hydrocarbons/toxicity; Polycyclic Aromatic Hydrocarbons/analysis
11.
J Proteome Res ; 22(2): 570-576, 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36622218

ABSTRACT

The pmartR (https://github.com/pmartR/pmartR) package was designed for the quality control (QC) and analysis of mass spectrometry data, tailored to specific characteristics of proteomic (isobaric or labeled), metabolomic, and lipidomic data sets. Since its initial release, the tool has been expanded to address the needs of its growing userbase and now includes QC and statistics for nuclear magnetic resonance metabolomic data, and leverages the DESeq2, edgeR, and limma-voom R packages for transcriptomic data analyses. These improvements have made progress toward a unified omics processing pipeline for ease of reporting and streamlined statistical purposes. The package's statistics and visualization capabilities have also been expanded by adding support for paired data and by integrating pmartR with the trelliscopejs R package for the quick creation of trellis displays (https://github.com/hafen/trelliscopejs). Here, we present relevant examples of each of these enhancements to pmartR and highlight how each new feature benefits the omics community.


Subject(s)
Proteomics; Software; Proteomics/methods; Metabolomics/methods; Gene Expression Profiling/methods; Quality Control
12.
Mol Cell Proteomics ; 22(2): 100491, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36603806

ABSTRACT

Conventional proteomic approaches measure the averaged signal from mixed cell populations or bulk tissues, leading to the dilution of signals arising from subpopulations of cells that might serve as important biomarkers. Recent developments in bottom-up proteomics have enabled spatial mapping of cellular heterogeneity in tissue microenvironments. However, bottom-up proteomics cannot unambiguously define and quantify proteoforms, which are intact (i.e., functional) forms of proteins capturing genetic variations, alternatively spliced transcripts and posttranslational modifications. Herein, we described a spatially resolved top-down proteomics (TDP) platform for proteoform identification and quantitation directly from tissue sections. The spatial TDP platform consisted of a nanodroplet processing in one pot for trace samples-based sample preparation system and a laser capture microdissection-based cell isolation system. We improved the nanodroplet processing in one pot for trace samples sample preparation by adding benzonase in the extraction buffer to enhance the coverage of nuclear proteins. Using ∼200 cultured cells as test samples, this approach increased total proteoform identifications from 493 to 700, with newly identified proteoforms primarily corresponding to nuclear proteins. To demonstrate the spatial TDP platform in tissue samples, we analyzed laser capture microdissection-isolated tissue voxels from rat brain cortex and hypothalamus regions. We quantified 509 proteoforms within the union of top-down mass spectrometry-based proteoform identification and characterization and TDPortal identifications to match with features from protein mass extractor. Several proteoforms corresponding to the same gene exhibited mixed abundance profiles between two tissue regions, suggesting potential posttranslational modification-specific spatial distributions. The spatial TDP workflow has prospects for biomarker discovery at the proteoform level from small tissue sections.


Subject(s)
Proteome; Proteomics; Proteome/metabolism; Microfluidics; Mass Spectrometry; DNA-Binding Proteins
13.
J Proteome Res ; 20(4): 2014-2020, 2021 Apr 02.
Article in English | MEDLINE | ID: mdl-33661636

ABSTRACT

Visual examination of mass spectrometry data is necessary to assess data quality and to facilitate data exploration. Graphics provide the means to evaluate spectral properties, test alternative peptide/protein sequence matches, prepare annotated spectra for publication, and fine-tune parameters during wet lab procedures. Visual inspection of LC-MS data is constrained by proteomics visualization software designed for particular workflows or vendor-specific tools without open-source code. We built PSpecteR, an open-source and interactive R Shiny web application for visualization of LC-MS data, with support for several steps of proteomics data processing, including reading various mass spectrometry files, running open-source database search engines, labeling spectra with fragmentation patterns, testing post-translational modifications, plotting where identified fragments map to reference sequences, and visualizing algorithmic output and metadata. All figures, tables, and spectra are exportable within one easy-to-use graphical user interface. Our current software provides a flexible and modern R framework to support fast implementation of additional features. The open-source code is readily available (https://github.com/EMSL-Computing/PSpecteR), and a PSpecteR Docker container (https://hub.docker.com/r/emslcomputing/pspecter) is available for easy local installation.


Subject(s)
Proteomics; Tandem Mass Spectrometry; Chromatography, Liquid; Proteins; Software